[Report] Use GraphRAG with Amazon Neptune to improve generative AI applications #AWSreInvent

AWS re:Invent 2024
#Amazon Neptune
#AWS
Takakuni
2024.12.03
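Hello! This is Takakuni (@takakuni_) from the Consulting Department of the AWS Business Division.
I'm in Las Vegas for re:Invent 2024.
https://reinvent.awsevents.com/
There was an interesting-looking workshop called "Use GraphRAG with Amazon Neptune to improve generative AI applications", so I attended it.
I had never used Amazon Neptune before, so I was looking forward to it.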
Session overview
Title
DAT309 | Use GraphRAG with Amazon Neptune to improve generative AI applications
Description
Retrieval Augmented Generation (RAG) applications use the power of generative AI to analyze private, differentiating datasets. However, baseline RAG can sometimes produce poor responses that lack explainability and contextual awareness and include conflated sources and spurious claims. GraphRAG combines knowledge graphs with RAG to produce explainable responses that are grounded in the semantic relationships between concepts, entities, and the underlying content. In this builders’ session, get hands-on experience using Amazon Neptune, graph notebooks, and LlamaIndex (an open source framework for building GraphRAG applications). You must bring your laptop to participate.
Speakers
Dave Bechberger, Principal Graph Architect, AWS
Melissa Kwok, Sr. Neptune Specialist SA, AWS
Taylor Riggan, Principal Graph Architect, Amazon Neptune, Amazon Web Services
Michael Schmidt, Principal Engineer, Amazon (AWS)
Brian O'Keefe, Principal Specialist Solutions Architect, Amazon Web Services
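Content
First, we learned what GraphRAG is and why it is used.
The materials gave the following example statements:
Example Corp sells widgets
The UK is the largest widget market
Widgets are shipped from China to the UK
Ports in the UK are experiencing delays
Given statements like these, suppose the question is "What is the outlook for widget sales in the UK?"
With vector search, "Example Corp sells widgets" and "The UK is the largest widget market" are linguistically similar to the question, so they are retrieved. However, the logistics issues, "Widgets are shipped from China to the UK" and "Ports in the UK are experiencing delays", are missed.
GraphRAG appears to be needed in cases like this, where the answer should take the relationships between facts into account.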
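The difference can be illustrated with a small toy sketch. This is not how Neptune or LlamaIndex work internally. The entity sets are hand-labelled and hypothetical, and the vector search result is simply assumed to be the two statements mentioned above. The sketch only shows how following shared entities can pull in the logistics statements that similarity alone missed.

```python
# Toy illustration only. Entities are hand-labelled (hypothetical),
# not extracted by an LLM or stored in a graph database.
statements = {
    "Example Corp sells widgets": {"Example Corp", "widget"},
    "The UK is the largest widget market": {"UK", "widget"},
    "Widgets are shipped from China to the UK": {"widget", "China", "UK"},
    "Ports in the UK are experiencing delays": {"UK", "port"},
}

# Assumed top-2 result of a vector search for the question
# "What is the outlook for widget sales in the UK?"
vector_hits = [
    "Example Corp sells widgets",
    "The UK is the largest widget market",
]


def expand_by_entities(hits, statements):
    """Return the hits plus every other statement sharing an entity with them."""
    seed_entities = set().union(*(statements[h] for h in hits))
    related = [
        text
        for text, entities in statements.items()
        if text not in hits and entities & seed_entities
    ]
    return list(hits) + related


graph_context = expand_by_entities(vector_hits, statements)
for text in graph_context:
    print(text)
```

In this sketch the expanded context contains all four statements, so a model answering the question would also see the shipping route and the port delays.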
Setup
The workshop was about building a graph from SBOM (Software Bill of Materials) data. The various setup steps use LlamaIndex.
%pip install -q llama-index==0.11.16 llama-index-llms-bedrock llama-index-graph-stores-neptune llama-index-embeddings-bedrock llama-index-readers-file nest-asyncio

# Configure LlamaIndex code
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.core import (
    SimpleDirectoryReader,
    PropertyGraphIndex,
    load_index_from_storage,
    StorageContext,
    Settings,
)

from llama_index.graph_stores.neptune import NeptuneAnalyticsPropertyGraphStore
from IPython.display import display

# Retrieve the configuration information for the notebook
import graph_notebook as gn
config = gn.configuration.get_config.get_config()
host = config.host.split('.')[0]

# Set environment variable for host for Streamlit App
import os
os.environ["HOST"] = host

# Setup nest to allow for reusing the event loop by LlamaIndex
# This is a required step when running in a Jupyter Notebook
import nest_asyncio
nest_asyncio.apply()
Next, Neptune comes in as the place to store the graph.
Because it is integrated with LlamaIndex, it is impressive how little code is needed to connect to it.
graph_store = NeptuneAnalyticsPropertyGraphStore(
    graph_identifier = host
)
Finally, the LLM settings. The workshop used Claude 3.5 Sonnet and Amazon Titan Text Embeddings V2.
model_id = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
llm = Bedrock(model=model_id)

Settings.llm = llm

embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0", additional_kwargs={"dimensions": 256})
Settings.embed_model = embed_model
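Creating the index
Now that the setup for importing is done, we create the index.
Creating the index reportedly requires the following steps:
Loading the data
Transforming the data
Indexing and storing the data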
Loading the data
reader = SimpleDirectoryReader(input_dir="/home/ec2-user/SageMaker/Neptune/00-Workshop-Start-HERE/data/")
documents = reader.load_data()
Transforming and storing the data
The loaded data is embedded and stored.
index = PropertyGraphIndex.from_documents(
        documents,
        property_graph_store=graph_store,
        embed_kg_nodes=True,
        show_progress=True,
)
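Creating the graph
Graph operations are done with openCypher. (This was the first time I had heard of it.)
It was a notebook-based hands-on, and the quiz-style format of task, hint, and answer made it fun.
For the first time in my life, I created a graph using Neptune.
Until now the only graphs I had made were charts in PowerPoint and Excel, so I feel I have grown.
The graph shown earlier is just a small part; there is actually much more.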
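For readers who are also new to openCypher, queries describe patterns of nodes and relationships to match. The pure-Python sketch below mimics what one simple pattern expresses. The component names, relationship types, and the placeholder vulnerability ID are all hypothetical and do not come from the workshop data, and the helper `match` is only an illustration, not a Neptune or openCypher API.

```python
# Hypothetical SBOM-like edges: (source, relationship, target).
# None of these names come from the workshop dataset.
triples = [
    ("web-app", "DEPENDS_ON", "libfoo"),
    ("web-app", "DEPENDS_ON", "libbar"),
    ("libfoo", "HAS_VULNERABILITY", "CVE-0000-0001"),
]

# Roughly what this openCypher pattern expresses:
#   MATCH (c {name: 'web-app'})-[:DEPENDS_ON]->(d) RETURN d.name


def match(triples, source=None, rel=None):
    """Return the targets of edges matching the given source and relationship."""
    return [
        target
        for src, relationship, target in triples
        if (source is None or src == source)
        and (rel is None or relationship == rel)
    ]


deps = match(triples, source="web-app", rel="DEPENDS_ON")
print(deps)
```

A real query would run against the Neptune graph itself. The sketch is only meant to make the node-relationship-node shape of a pattern concrete.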
Querying with GraphRAG
Finally, we ran a query using GraphRAG.
query_engine = index.as_query_engine(
    include_text=True,
)
resp = query_engine.query("Where do I need to store SBOM data?")
print(resp)
The answer came back without any problems:
SBOMs at Example Corp must be stored in the ESSS (Example SBOM storage system). This is a secure, centralized repository that has access controls and audit logging capabilities. It's important to note that all deployments are required to include SBOM publishing to ESSS before moving from the SIT to UAT environments. Failing to comply with this policy requires an exemption from AppSec, or it will be considered a Sev1 security issue.
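Summary
That was "[Report] Use GraphRAG with Amazon Neptune to improve generative AI applications".
I had thought GraphRAG looked interesting, and I'm glad I was able to understand it starting from why it is needed.
I'd like to look into things such as what kind of data preprocessing is required.
This was Takakuni (@takakuni_) from the Consulting Department of the AWS Business Division!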